Sentiment Analysis

from Cambridge University studies


Sentiment analysis aims to quantify the emotion expressed in a text. This project compares several sentiment analysis tools for classifying movie reviews.
First, a lexicon-based approach was used, scoring each sentence with a dictionary of positive and negative words. This exposed a bias towards positive language in the dataset (shown above). Next, unigram, bigram and trigram models were tested; these consider the likelihood of a word, or a sequence of two or three words, appearing in a positive or negative review. Finally, document embeddings were used: a Doc2Vec model was trained on the dataset, and pre-trained BERT and DistilBERT implementations were also tested. Overall, Doc2Vec gave the best results, although the BERT/DistilBERT architectures would likely perform better if they were fine-tuned for this specific task.
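
The first two approaches can be illustrated with a minimal Python sketch. It is not the project's code: the word lists are a hypothetical toy lexicon, the names `lexicon_score` and `NgramNaiveBayes` are invented for this example, and the choice of Naive Bayes with add-one smoothing as the n-gram classifier is an assumption, since the summary does not say which model was used.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption); a real project would load a published one.
POSITIVE = {"good", "great", "excellent", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful", "dull"}

PUNCTUATION = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (word.strip(PUNCTUATION) for word in text.lower().split())
    return [word for word in stripped if word]


def lexicon_score(text):
    """Number of positive words minus number of negative words."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-grams with add-one (Laplace) smoothing."""

    def __init__(self, n=1):
        self.n = n                    # 1 = unigram, 2 = bigram, 3 = trigram
        self.counts = {}              # label -> Counter of n-grams
        self.doc_counts = Counter()   # label -> number of training documents
        self.vocab = set()            # every n-gram seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = ngrams(tokenize(text), self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = ngrams(tokenize(text), self.n)
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label, gram_counts in self.counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            logp = math.log(self.doc_counts[label] / total_docs)
            denom = sum(gram_counts.values()) + len(self.vocab)
            for gram in grams:
                logp += math.log((gram_counts[gram] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


if __name__ == "__main__":
    reviews = [
        "a great and enjoyable film",
        "excellent acting, great story",
        "a boring and dull film",
        "awful plot, bad acting",
    ]
    labels = ["pos", "pos", "neg", "neg"]
    model = NgramNaiveBayes(n=1).fit(reviews, labels)
    print(lexicon_score("A great, enjoyable film!"), model.predict("boring plot"))
```

Setting `n=2` or `n=3` switches the same classifier to bigrams or trigrams. A simple lexicon score like this one depends heavily on which words the dictionary covers, which is one way a skew towards positive language in a dataset can show up in its results.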

Download the Report